Springboard Data Science Career Track Unit 4 Challenge - Tier 3 Complete¶

Objectives¶

Hey! Great job getting through those challenging DataCamp courses. You're learning a lot in a short span of time.

In this notebook, you're going to apply the skills you've been learning, bridging the gap between the controlled environment of DataCamp and the slightly messier work that data scientists do with actual datasets!

Here’s the mystery we’re going to solve: which boroughs of London have seen the greatest increase in housing prices, on average, over the last two decades?

A borough is just a fancy word for district. You may be familiar with the five boroughs of New York… well, there are 32 boroughs within Greater London (here's some info for the curious). Some of them are more desirable areas to live in, and the data will reflect that with a greater rise in housing prices.

This is the Tier 3 notebook, which means it's not filled in at all: we'll just give you the skeleton of a project, the brief and the data. It's up to you to play around with it and see what you can find out! Good luck! If you struggle, feel free to look at easier tiers for help; but try to dip in and out of them, as the more independent work you do, the better it is for your learning!

This challenge will make use of only what you learned in the following DataCamp courses:

Prework courses (Introduction to Python for Data Science, Intermediate Python for Data Science)
Data Types for Data Science
Python Data Science Toolbox (Part One)
pandas Foundations
Manipulating DataFrames with pandas
Merging DataFrames with pandas

Of the tools, techniques and concepts in the above DataCamp courses, this challenge should require the application of the following:

pandas
- data ingestion and inspection (pandas Foundations, Module One)
- exploratory data analysis (pandas Foundations, Module Two)
- tidying and cleaning (Manipulating DataFrames with pandas, Module Three)
- transforming DataFrames (Manipulating DataFrames with pandas, Module One)
- subsetting DataFrames with lists (Manipulating DataFrames with pandas, Module One)
- filtering DataFrames (Manipulating DataFrames with pandas, Module One)
- grouping data (Manipulating DataFrames with pandas, Module Four)
- melting data (Manipulating DataFrames with pandas, Module Three)
- advanced indexing (Manipulating DataFrames with pandas, Module Four)
matplotlib (Intermediate Python for Data Science, Module One)
fundamental data types (Data Types for Data Science, Module One)
dictionaries (Intermediate Python for Data Science, Module Two)
handling dates and times (Data Types for Data Science, Module Four)
function definition (Python Data Science Toolbox - Part One, Module One)
default arguments, variable length, and scope (Python Data Science Toolbox - Part One, Module Two)
lambda functions and error handling (Python Data Science Toolbox - Part One, Module Four)

The Data Science Pipeline¶

This is Tier Three, so we'll get you started. But after that, it's all in your hands! When you feel done with your investigations, look back over what you've accomplished, and prepare a quick presentation of your findings for the next mentor meeting.

Data Science is magical. In this case study, you'll get to apply some complex machine learning algorithms. But as David Spiegelhalter reminds us, there is no substitute for simply taking a really, really good look at the data. Sometimes, this is all we need to answer our question.

Data Science projects generally adhere to the four stages of Data Science Pipeline:

Sourcing and loading
Cleaning, transforming, and visualizing
Modeling
Evaluating and concluding

1. Sourcing and Loading¶

Any Data Science project kicks off by importing pandas. The documentation of this wonderful library can be found here. As you've seen, pandas is conveniently connected to the Numpy and Matplotlib libraries.

Hint: This part of the data science pipeline will test those skills you acquired in the pandas Foundations course, Module One.

1.1. Importing Libraries¶

# Let's import the pandas, numpy libraries as pd, and np respectively. 
import pandas as pd
import numpy as np

# Load the pyplot collection of functions from matplotlib, as plt 
import matplotlib.pyplot as plt

1.2. Loading the data¶

Your data comes from the London Datastore: a free, open-source data-sharing portal for London-oriented datasets.

# First, make a variable called url_LondonHousePrices, and assign it the following link, enclosed in quotation-marks as a string:
# https://data.london.gov.uk/download/uk-house-price-index/70ac0766-8902-4eb5-aab5-01951aaed773/UK%20House%20price%20index.xls

url_LondonHousePrices = "https://data.london.gov.uk/download/uk-house-price-index/70ac0766-8902-4eb5-aab5-01951aaed773/UK_House_price_index.xlsx"

# The dataset we're interested in contains the Average prices of the houses, and is actually on a particular sheet of the Excel file. 
# As a result, we need to specify the sheet name in the read_excel() method.
# Put this data into a variable called properties.  
properties = pd.read_excel(url_LondonHousePrices, sheet_name='Average price', index_col= None)

2. Cleaning, transforming, and visualizing¶

This second stage is arguably the most important part of any Data Science project. The first thing to do is take a proper look at the data. Cleaning forms the majority of this stage, and can be done both before or after Transformation.

The end goal of data cleaning is to have tidy data. When data is tidy:

Each variable has a column.
Each observation forms a row.

Keep the end goal in mind as you move through this process, every step will take you closer.

Hint: This part of the data science pipeline should test those skills you acquired in:

Intermediate Python for data science, all modules.
pandas Foundations, all modules.
Manipulating DataFrames with pandas, all modules.
Data Types for Data Science, Module Four.
Python Data Science Toolbox - Part One, all modules

2.1. Exploring your data

Think about your pandas functions for checking out a dataframe.

properties.head()

2.2. Cleaning the data

You might find you need to transpose your dataframe, check out what its row indexes are, and reset the index. You also might find you need to assign the values of the first row to your column headings . (Hint: recall the .columns feature of DataFrames, as well as the iloc[] method).

Don't be afraid to use StackOverflow for help with this.

properties_T = properties.transpose()

properties_T = properties_T.reset_index()
properties_T.head()

# properties_T.iloc[[0]]
# type(properties_T.iloc[0])
properties_T.iloc[0,:]

properties_T.columns = list(properties_T.iloc[0,:])
# properties_T.columns = list(properties_T.iloc[[0]])

properties_T = properties_T.loc[1:,:]

properties_T.head()

2.3. Cleaning the data (part 2)

You might we have to rename a couple columns. How do you do this? The clue's pretty bold...

properties_T = properties_T.rename(columns={"Unnamed: 0": "borough", pd.NaT: "id"})
properties_T.head()

2.4.Transforming the data

Remember what Wes McKinney said about tidy data?

You might need to melt your DataFrame here.

properties_T_melt = properties_T.melt(id_vars=['borough','id'])
properties_T_melt.head()

Remember to make sure your column data types are all correct. Average prices, for example, should be floating point numbers...

properties_T_rename = properties_T_melt.rename(columns = {'variable':'date','value':'average_price'})

properties_T_rename.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15216 entries, 0 to 15215
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   borough        15216 non-null  object        
 1   id             14265 non-null  object        
 2   date           15216 non-null  datetime64[ns]
 3   average_price  14265 non-null  object        
dtypes: datetime64[ns](1), object(3)
memory usage: 475.6+ KB

# This function converts stringer into integer.
def str2int(s):
    try:
        return int(s)
    except:
        return np.nan

properties_T_rename['average_price'] = properties_T_rename['average_price'].apply(lambda x: str2int(x))

2.5. Cleaning the data (part 3)

Do we have an equal number of observations in the ID, Average Price, Month, and London Borough columns? Remember that there are only 32 London Boroughs. How many entries do you have in that column?

Check out the contents of the London Borough column, and if you find null values, get rid of them however you see fit.

set(properties_T_rename['borough'].unique())

{'Barking & Dagenham',
 'Barnet',
 'Bexley',
 'Brent',
 'Bromley',
 'Camden',
 'City of London',
 'Croydon',
 'EAST MIDLANDS',
 'EAST OF ENGLAND',
 'Ealing',
 'Enfield',
 'England',
 'Greenwich',
 'Hackney',
 'Hammersmith & Fulham',
 'Haringey',
 'Harrow',
 'Havering',
 'Hillingdon',
 'Hounslow',
 'Inner London',
 'Islington',
 'Kensington & Chelsea',
 'Kingston upon Thames',
 'LONDON',
 'Lambeth',
 'Lewisham',
 'Merton',
 'NORTH EAST',
 'NORTH WEST',
 'Newham',
 'Outer London',
 'Redbridge',
 'Richmond upon Thames',
 'SOUTH EAST',
 'SOUTH WEST',
 'Southwark',
 'Sutton',
 'Tower Hamlets',
 'Unnamed: 34',
 'Unnamed: 37',
 'Unnamed: 47',
 'WEST MIDLANDS',
 'Waltham Forest',
 'Wandsworth',
 'Westminster',
 'YORKS & THE HUMBER'}

properties_T_rename.isnull().sum()

borough            0
id               951
date               0
average_price    951
dtype: int64

properties_T_rename[properties_T_rename['id'].isnull()]

NaNFree_properties_T_rename = properties_T_rename.dropna()

NaNFree_properties_T_rename.isnull().sum()

borough          0
id               0
date             0
average_price    0
dtype: int64

2.6. Visualizing the data

To visualize the data, why not subset on a particular London Borough? Maybe do a line plot of Month against Average Price?

set(NaNFree_properties_T_rename['borough'].unique())

{'Barking & Dagenham',
 'Barnet',
 'Bexley',
 'Brent',
 'Bromley',
 'Camden',
 'City of London',
 'Croydon',
 'EAST MIDLANDS',
 'EAST OF ENGLAND',
 'Ealing',
 'Enfield',
 'England',
 'Greenwich',
 'Hackney',
 'Hammersmith & Fulham',
 'Haringey',
 'Harrow',
 'Havering',
 'Hillingdon',
 'Hounslow',
 'Inner London',
 'Islington',
 'Kensington & Chelsea',
 'Kingston upon Thames',
 'LONDON',
 'Lambeth',
 'Lewisham',
 'Merton',
 'NORTH EAST',
 'NORTH WEST',
 'Newham',
 'Outer London',
 'Redbridge',
 'Richmond upon Thames',
 'SOUTH EAST',
 'SOUTH WEST',
 'Southwark',
 'Sutton',
 'Tower Hamlets',
 'WEST MIDLANDS',
 'Waltham Forest',
 'Wandsworth',
 'Westminster',
 'YORKS & THE HUMBER'}

subBrough = ['Barking & Dagenham',
 'Barnet',
 'Bexley',
 'Brent',
 'Bromley',
 'Camden',
#  'City of London',
 'Croydon',
#  'EAST MIDLANDS',
#  'EAST OF ENGLAND',
 'Ealing',
 'Enfield',
 'England',
 'Greenwich',
 'Hackney',
 'Hammersmith & Fulham',
 'Haringey',
 'Harrow',
 'Havering',
 'Hillingdon',
 'Hounslow',
#  'Inner London',
 'Islington',
 'Kensington & Chelsea',
 'Kingston upon Thames',
 'LONDON',
 'Lambeth',
 'Lewisham',
 'Merton',
#  'NORTH EAST',
#  'NORTH WEST',
 'Newham',
 'Outer London',
 'Redbridge',
 'Richmond upon Thames',
#  'SOUTH EAST',
#  'SOUTH WEST',
 'Southwark',
 'Sutton',
 'Tower Hamlets',
 'WEST MIDLANDS',
 'Waltham Forest',
 'Wandsworth',
 'Westminster',
 'YORKS & THE HUMBER']

subset_NaNFree_properties_T_rename = NaNFree_properties_T_rename[NaNFree_properties_T_rename['borough'].isin(subBrough)]
# subset_NaNFree_properties_T_rename.head()

To limit the number of data points you have, you might want to extract the year from every month value your Month column.

To this end, you could apply a lambda function. Your logic could work as follows:

look through the Month column
extract the year from each individual value in that column
store that corresponding year as separate column.

Whether you go ahead with this is up to you. Just so long as you answer our initial brief: which boroughs of London have seen the greatest house price increase, on average, over the past two decades?

subset_NaNFree_properties_T_rename['year']=subset_NaNFree_properties_T_rename['date'].apply(lambda x: x.year)
subset_NaNFree_properties_T_rename.head()

<ipython-input-22-c7103b470dff>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subset_NaNFree_properties_T_rename['year']=subset_NaNFree_properties_T_rename['date'].apply(lambda x: x.year)

import cufflinks as cf

#We set the all charts as public
cf.set_config_file(sharing='public',theme='pearl',offline=False)
cf.go_offline()

subset_NaNFree_properties_T_rename.pivot_table(index='year', columns='borough', values='average_price',aggfunc=np.mean ).iplot(kind='scatter',xTitle='date',yTitle='borough',title='average price over years')

fig, ax = plt.subplots(figsize=(20, 15))
subset_NaNFree_properties_T_rename.pivot_table(index='year', columns='borough', values='average_price',aggfunc=np.mean ).plot(ax=ax)
ax.set_xlim(subset_NaNFree_properties_T_rename['year'].min(), subset_NaNFree_properties_T_rename['year'].max())
ax.set_ylim(0, subset_NaNFree_properties_T_rename['average_price'].max())
ax.ticklabel_format(style='plain')
plt.show()

3. Modeling

Consider creating a function that will calculate a ratio of house prices, comparing the price of a house in 2018 to the price in 1998.

Consider calling this function create_price_ratio.

You'd want this function to:

Take a filter of dfg, specifically where this filter constrains the London_Borough, as an argument. For example, one admissible argument should be: dfg[dfg['London_Borough']=='Camden'].
Get the Average Price for that Borough, for the years 1998 and 2018.
Calculate the ratio of the Average Price for 1998 divided by the Average Price for 2018.
Return that ratio.

Once you've written this function, you ultimately want to use it to iterate through all the unique London_Boroughs and work out the ratio capturing the difference of house prices between 1998 and 2018.

Bear in mind: you don't have to write a function like this if you don't want to. If you can solve the brief otherwise, then great!

Hint: This section should test the skills you acquired in:

Python Data Science Toolbox - Part One, all modules

dfg = subset_NaNFree_properties_T_rename.groupby(['borough','year']).mean()
dfg.reset_index().head()

ratio = (dfg.xs(2021,level='year')/dfg.xs(1998,level='year')).rename(columns={'average_price':'ratio'})
ratio.sort_values('ratio',ascending=False,inplace=True)
ratio.reset_index().head(5)

ratio.plot(kind='bar',y='ratio',figsize=(10,5),xlabel='Borough',ylabel='Price ratio(2021/1998)')

plt.show()

ratio.iplot(kind='bar',xTitle='Borough',yTitle='Price ratio(2018/1998)',title='Cufflinks - Bar Chart')

**if lambda function has to be used...¶

def create_price_ratio(df):
    return (df[df['year']==2018]['average_price']).values/(df[df['year']==1998]['average_price']).values

df[df['year']==2018]['average_price'].values

reset_dfg = dfg.reset_index()
reset_dfg.head()
# reset_dfg.info()

ratio={}
for name in list(reset_dfg.borough.unique()):
    ratio[name] = create_price_ratio(reset_dfg[reset_dfg['borough']==name])
ratio

{'Barking & Dagenham': array([4.89664706]),
 'Barnet': array([4.35820594]),
 'Bexley': array([4.24899905]),
 'Brent': array([4.89458596]),
 'Bromley': array([4.09479358]),
 'Camden': array([4.93536423]),
 'Croydon': array([4.20111391]),
 'Ealing': array([4.31146566]),
 'Enfield': array([4.26348447]),
 'England': array([3.81047519]),
 'Greenwich': array([4.76305948]),
 'Hackney': array([6.1983043]),
 'Hammersmith & Fulham': array([4.13780717]),
 'Haringey': array([5.13464921]),
 'Harrow': array([4.05920936]),
 'Havering': array([4.32524484]),
 'Hillingdon': array([4.20028803]),
 'Hounslow': array([3.97642502]),
 'Islington': array([4.84406149]),
 'Kensington & Chelsea': array([5.08247169]),
 'Kingston upon Thames': array([4.2705614]),
 'LONDON': array([4.67980038]),
 'Lambeth': array([4.95776824]),
 'Lewisham': array([5.44925656]),
 'Merton': array([4.7412939]),
 'Newham': array([5.30543022]),
 'Outer London': array([4.41896555]),
 'Redbridge': array([4.47119386]),
 'Richmond upon Thames': array([4.00516807]),
 'Southwark': array([5.51650489]),
 'Sutton': array([4.11854535]),
 'Tower Hamlets': array([4.62672185]),
 'WEST MIDLANDS': array([3.31128386]),
 'Waltham Forest': array([5.83478808]),
 'Wandsworth': array([4.75772672]),
 'Westminster': array([5.35357218]),
 'YORKS & THE HUMBER': array([3.35607992])}

pd_ratio = pd.DataFrame(ratio).T.rename(columns={0:'ratio'})
pd_ratio.sort_values('ratio').plot(kind='bar',xlabel='Borough',ylabel='Price ratio(2018/1998)',figsize=(15,5))
plt.show()

4. Conclusion¶

What can you conclude? Type out your conclusion below.

Look back at your notebook. Think about how you might summarize what you have done, and prepare a quick presentation on it to your mentor at your next meeting.

We hope you enjoyed this practical project. It should have consolidated your data hygiene and pandas skills by looking at a real-world problem involving just the kind of dataset you might encounter as a budding data scientist. Congratulations, and looking forward to seeing you at the next step in the course!

my answer¶

The graph shows the price ratio of houses in each borough, comparing the price of a house in 2018 to that in 1998.

Overall, all the rarios were positive. The median of the ratios is 4.5, the mean is 4.6, the IQR is 0.73, and the std is 0.63

The highest price ratio was about 6.2 in Hackney. This means that, the average price of a house in Hackney in 2018 was 6 times as expensive as the average price in 1998. In cnotrast, the lowest price ratio was 3.3 in WEST MIDLANDS. It can be prediced that the ratio of house price has strong correlation with borough.

When it comes to the changing rate of price, in the last year in the graph, some borough have positive trends, but the others have negatives.

In addition, as this graph indicates only average, there would be the other factors which affected the price or the changnig rate of price. These might be public transport, office, or facilities around the house.

gretest_price_2021 = dfg.xs(2021,level='year').reset_index().sort_values('average_price',ascending=False)
gretest_price_2021.head(5)

profit_ratio = ((dfg.xs(2021,level='year')/dfg.xs(1998,level='year'))**(1/26)-1).rename(columns={'average_price':'profit_ratio'})
profit_ratio.sort_values('profit_ratio',ascending=False,inplace=True)
profit_ratio.reset_index().head(30)

profit_ratio.reset_index().merge(profit_ratio.reset_index(),how='outer',on='borough')

	Unnamed: 0	City of London	Barking & Dagenham	Barnet	Bexley	Brent	Bromley	Camden	Croydon	Ealing	...	NORTH WEST	YORKS & THE HUMBER	EAST MIDLANDS	WEST MIDLANDS	EAST OF ENGLAND	LONDON	SOUTH EAST	SOUTH WEST	Unnamed: 47	England
0	NaT	E09000001	E09000002	E09000003	E09000004	E09000005	E09000006	E09000007	E09000008	E09000009	...	E12000002	E12000003	E12000004	E12000005	E12000006	E12000007	E12000008	E12000009	NaN	E92000001
1	1995-01-01	91448.98487	50460.2266	93284.51832	64958.09036	71306.56698	81671.47692	120932.8881	69158.16225	79885.89069	...	43958.48001	44803.42878	45544.52227	48527.52339	56701.5961	74435.76052	64018.87894	54705.1579	NaN	53202.77128
2	1995-02-01	82202.77314	51085.77983	93190.16963	64787.92069	72022.26197	81657.55944	119508.8622	68951.09542	80897.06551	...	43925.42289	44528.80721	46051.57066	49341.29029	56593.59475	72777.93709	63715.02399	54356.14843	NaN	53096.1549
3	1995-03-01	79120.70256	51268.96956	92247.52435	64367.49344	72015.76274	81449.31143	120282.2131	68712.44341	81379.86288	...	44434.8681	45200.46775	45383.82395	49442.17973	56171.18278	73896.84204	64113.60858	53583.07667	NaN	53201.2843
4	1995-04-01	77101.20804	53133.50526	90762.87492	64277.66881	72965.63094	81124.41227	120097.899	68610.04641	82188.90498	...	44267.7796	45614.34341	46124.23045	49455.93299	56567.89582	74455.28754	64623.22395	54786.01938	NaN	53590.8548

	index	0	1	2	3	4	5	6	7	8	...	308	309	310	311	312	313	314	315	316	317
0	Unnamed: 0	NaT	1995-01-01 00:00:00	1995-02-01 00:00:00	1995-03-01 00:00:00	1995-04-01 00:00:00	1995-05-01 00:00:00	1995-06-01 00:00:00	1995-07-01 00:00:00	1995-08-01 00:00:00	...	2020-08-01 00:00:00	2020-09-01 00:00:00	2020-10-01 00:00:00	2020-11-01 00:00:00	2020-12-01 00:00:00	2021-01-01 00:00:00	2021-02-01 00:00:00	2021-03-01 00:00:00	2021-04-01 00:00:00	2021-05-01 00:00:00
1	City of London	E09000001	91448.98487	82202.77314	79120.70256	77101.20804	84409.14932	94900.51244	110128.0423	112329.4376	...	827658.5167	798499.1646	836807.0518	769391.7847	801999.0466	760599.2322	729315.7243	717528.2633	713094.5391	765822.2726
2	Barking & Dagenham	E09000002	50460.2266	51085.77983	51268.96956	53133.50526	53042.24852	53700.34831	52113.12157	52232.19868	...	304923.5027	301702.4656	304852.1338	305012.3291	309560.1023	311007.1517	313835.8023	314187.4036	309680.5549	308760.1446
3	Barnet	E09000003	93284.51832	93190.16963	92247.52435	90762.87492	90258.00033	90107.23471	91441.24768	92361.31512	...	529660.1524	533619.23	530353.5205	528816.4928	532528.0926	538842.4015	538215.8867	542872.3895	545746.3543	546082.2634
4	Bexley	E09000004	64958.09036	64787.92069	64367.49344	64277.66881	63997.13588	64252.32335	63722.70055	64432.60005	...	346679.9244	345061.4945	345404.9836	346252.9145	352260.8385	356664.073	360116.1134	364480.9055	365519.4195	368554.8059

	Unnamed: 0	NaN	1995-01-01 00:00:00	1995-02-01 00:00:00	1995-03-01 00:00:00	1995-04-01 00:00:00	1995-05-01 00:00:00	1995-06-01 00:00:00	1995-07-01 00:00:00	1995-08-01 00:00:00	...	2020-08-01 00:00:00	2020-09-01 00:00:00	2020-10-01 00:00:00	2020-11-01 00:00:00	2020-12-01 00:00:00	2021-01-01 00:00:00	2021-02-01 00:00:00	2021-03-01 00:00:00	2021-04-01 00:00:00	2021-05-01 00:00:00
1	City of London	E09000001	91448.98487	82202.77314	79120.70256	77101.20804	84409.14932	94900.51244	110128.0423	112329.4376	...	827658.5167	798499.1646	836807.0518	769391.7847	801999.0466	760599.2322	729315.7243	717528.2633	713094.5391	765822.2726
2	Barking & Dagenham	E09000002	50460.2266	51085.77983	51268.96956	53133.50526	53042.24852	53700.34831	52113.12157	52232.19868	...	304923.5027	301702.4656	304852.1338	305012.3291	309560.1023	311007.1517	313835.8023	314187.4036	309680.5549	308760.1446
3	Barnet	E09000003	93284.51832	93190.16963	92247.52435	90762.87492	90258.00033	90107.23471	91441.24768	92361.31512	...	529660.1524	533619.23	530353.5205	528816.4928	532528.0926	538842.4015	538215.8867	542872.3895	545746.3543	546082.2634
4	Bexley	E09000004	64958.09036	64787.92069	64367.49344	64277.66881	63997.13588	64252.32335	63722.70055	64432.60005	...	346679.9244	345061.4945	345404.9836	346252.9145	352260.8385	356664.073	360116.1134	364480.9055	365519.4195	368554.8059
5	Brent	E09000005	71306.56698	72022.26197	72015.76274	72965.63094	73704.04743	74310.48167	74127.03788	73547.0411	...	497729.0327	519561.8277	522239.3216	513100.814	495066.6459	511347.8856	514388.7387	519212.9441	498083.9417	508570.7598

	borough	id	1995-01-01 00:00:00	1995-02-01 00:00:00	1995-03-01 00:00:00	1995-04-01 00:00:00	1995-05-01 00:00:00	1995-06-01 00:00:00	1995-07-01 00:00:00	1995-08-01 00:00:00	...	2020-08-01 00:00:00	2020-09-01 00:00:00	2020-10-01 00:00:00	2020-11-01 00:00:00	2020-12-01 00:00:00	2021-01-01 00:00:00	2021-02-01 00:00:00	2021-03-01 00:00:00	2021-04-01 00:00:00	2021-05-01 00:00:00
1	City of London	E09000001	91448.98487	82202.77314	79120.70256	77101.20804	84409.14932	94900.51244	110128.0423	112329.4376	...	827658.5167	798499.1646	836807.0518	769391.7847	801999.0466	760599.2322	729315.7243	717528.2633	713094.5391	765822.2726
2	Barking & Dagenham	E09000002	50460.2266	51085.77983	51268.96956	53133.50526	53042.24852	53700.34831	52113.12157	52232.19868	...	304923.5027	301702.4656	304852.1338	305012.3291	309560.1023	311007.1517	313835.8023	314187.4036	309680.5549	308760.1446
3	Barnet	E09000003	93284.51832	93190.16963	92247.52435	90762.87492	90258.00033	90107.23471	91441.24768	92361.31512	...	529660.1524	533619.23	530353.5205	528816.4928	532528.0926	538842.4015	538215.8867	542872.3895	545746.3543	546082.2634
4	Bexley	E09000004	64958.09036	64787.92069	64367.49344	64277.66881	63997.13588	64252.32335	63722.70055	64432.60005	...	346679.9244	345061.4945	345404.9836	346252.9145	352260.8385	356664.073	360116.1134	364480.9055	365519.4195	368554.8059
5	Brent	E09000005	71306.56698	72022.26197	72015.76274	72965.63094	73704.04743	74310.48167	74127.03788	73547.0411	...	497729.0327	519561.8277	522239.3216	513100.814	495066.6459	511347.8856	514388.7387	519212.9441	498083.9417	508570.7598

	borough	id	variable	value
0	City of London	E09000001	1995-01-01	91448.98487
1	Barking & Dagenham	E09000002	1995-01-01	50460.2266
2	Barnet	E09000003	1995-01-01	93284.51832
3	Bexley	E09000004	1995-01-01	64958.09036
4	Brent	E09000005	1995-01-01	71306.56698

	borough	id	date	average_price
33	Unnamed: 34	NaN	1995-01-01	NaN
36	Unnamed: 37	NaN	1995-01-01	NaN
46	Unnamed: 47	NaN	1995-01-01	NaN
81	Unnamed: 34	NaN	1995-02-01	NaN
84	Unnamed: 37	NaN	1995-02-01	NaN
...	...	...	...	...
15156	Unnamed: 37	NaN	2021-04-01	NaN
15166	Unnamed: 47	NaN	2021-04-01	NaN
15201	Unnamed: 34	NaN	2021-05-01	NaN
15204	Unnamed: 37	NaN	2021-05-01	NaN
15214	Unnamed: 47	NaN	2021-05-01	NaN

	borough	year	average_price
0	Barking & Dagenham	1995	51817.500000
1	Barking & Dagenham	1996	51717.833333
2	Barking & Dagenham	1997	55973.666667
3	Barking & Dagenham	1998	60285.333333
4	Barking & Dagenham	1999	65320.500000

	borough	ratio
0	Hackney	6.667049
1	Waltham Forest	6.383784
2	Lewisham	5.899919
3	Southwark	5.797158
4	Newham	5.736885

	borough	average_price
19	Kensington & Chelsea	1247367.0
35	Westminster	965211.2
5	Camden	809411.8
12	Hammersmith & Fulham	741961.2
28	Richmond upon Thames	699516.0

	borough	profit_ratio
0	Hackney	0.075696
1	Waltham Forest	0.073902
2	Lewisham	0.070651
3	Southwark	0.069928
4	Newham	0.069498
5	Haringey	0.066576
6	Lambeth	0.065516
7	Barking & Dagenham	0.065203
8	Merton	0.064686
9	Brent	0.064411
10	Westminster	0.064393
11	Islington	0.064052
12	Greenwich	0.063723
13	Tower Hamlets	0.063379
14	LONDON	0.062962
15	Wandsworth	0.062514
16	Redbridge	0.061924
17	Camden	0.061692
18	Outer London	0.060892
19	Kensington & Chelsea	0.060877
20	Enfield	0.059800
21	Ealing	0.059728
22	Havering	0.059590
23	Bexley	0.059580
24	Barnet	0.058939
25	Croydon	0.058534
26	Hillingdon	0.058532
27	Kingston upon Thames	0.058177
28	Hounslow	0.057962
29	Bromley	0.057734